ABSTRACT

The double-stranded DNA of the genome contains
both sequence information directly relating to
protein and RNA coding and functional and
structural information relating to protein recognition.
Only recently has the importance of DNA shape
in this recognition process been fully appreciated,
and it also appears that minor groove electronegative
potential may contribute significantly in guiding
proteins to their cognate binding sites in the genome.
Based on photo-chemical probing results,
we have derived an algorithm that predicts the 
minor groove electronegative potential in a DNA 
helix of any given sequence. We have validated 
this model on a series of protein-DNA binding sites 
known to involve minor groove electrostatic recog- 
nition as well as on stable nucleosome core com- 
plexes. The algorithm allows for the first time a full 
minor groove electrostatic description at the nu- 
cleotide resolution of any genome, and it is illu- 
strated how such detailed studies of this sequence 
dependent, inherent property of the DNA may reflect 
on genome organization, gene expression and 
chromosomal condensation. 

INTRODUCTION 

Proteins such as transcription factors and histones in
nucleosomes are pivotal in correct decoding of the genetic
information of the double-stranded DNA helix by binding 
to specific sequences in the genomes thereby controlling 
proper gene activation. Sequence-specific protein binding 
is primarily accomplished through direct reading of the 
nucleobase sequence via specific protein nucleobase 
contacts predominantly in the DNA major groove (1). 
However, the base sequence also controls DNA double 
helix conformation and specific properties such as minor 
groove width and electronegative potential (2-4). It has
been known for more than a decade that DNA
recognition by many small ligands (5) relies on
minor groove shape and electrostatic potential, and it is 
also recognized that these features of the DNA helix can 
be critical for protein recognition (6-8). Nevertheless, only 
recently has a more general and detailed understanding of 
the importance of variations in minor groove electronega- 
tive potential for protein binding been documented (9-11). 
In particular, it has been suggested based on the crystal 
structure data that a large number of proteins may recog- 
nize and be guided to their binding sites on the DNA helix 
through specific arginines reading the electronegative po- 
tential in the minor groove (9). Thus, knowing the electro- 
negative potential along the DNA of the genome is 
important for a detailed understanding of the DNA 
function in terms of protein recognition. However, 
reliable techniques for directly probing and especially 
predicting minor groove electronegative potential have 
hitherto not been available. 

Minor groove width and electronegative potential of the 
DNA helix are clearly interconnected properties as a 
closer distance of the phosphates across the groove will 
increase the negative potential in the groove, but the 
electron distribution within the base pairs at the floor of 
the groove also plays a decisive role, as does the sugar 
conformation and thus the exact position of the
phosphates relative to the groove (2-4). Therefore, minor
groove width and electronegative potential are distinct 
but not independent helix parameters. 

We have previously demonstrated that the sequence de- 
pendence of photo-chemical cleavage of double-stranded 
DNA by the uranyl(VI) ion (UO₂²⁺) reflects these DNA
parameters (12-14). Mechanistically, we have presented 
evidence that the uranyl divalent cation bound to the 
phosphates of the backbone photo-oxidizes proximal
deoxyriboses (15). Thus, we proposed that uranyl
photo-probing of duplex DNA in principle could be exploited
to semi-quantitatively assess the minor groove width, but
presumably more precisely the minor groove electronegative
potential, along any DNA helix sequence (12,14). We
argued that sensing groove width could be due to bidentate
coordination to opposite phosphates across the
groove, while electronegative potential sensing obviously
would be caused by electrostatic attraction of the cationic 
(solvated) uranyl. In this study, we present further valid- 
ation that uranyl photo-cleavage analysis does indeed to a 
very large extent reflect DNA minor groove electronega- 
tive potential, and we offer an algorithm that with high 
accuracy allows prediction of an electronegative potential 
genome map at the nucleotide level. Thus in the present 
study, we discuss the uranyl probing data only with refer- 
ence to minor groove electronegative potential, but the 
close connection to minor groove width as discussed 
above should be considered throughout. 

MATERIALS AND METHODS

The pentamer library, consisting of 7 members (94 nt in 
length), and the protein binding sites were cloned into the 
BamHI site of plasmid pUC19 by standard methods. Each 
clone member of the library contained a subset of all the
1024 possible pentanucleotide sequences (Supplementary
Figure S1A-D and Supplementary Table S1). The
pentamer library sequences were designed to contain a
common internal control sequence (A-tracts) proximal to
the terminal BamHI cloning sites, in order to aid data
normalization. The uranyl cleavage peak areas of the flanking
common BamHI sequences were used to normalize the
data for the seven clone members.
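
The normalization step can be illustrated with a short Python sketch. The exact scaling scheme is not stated in the text, so this is only one plausible reading: each clone's peak areas are divided by the mean peak area of its shared control positions, so that the control region averages 1.0 in every clone. The function name `normalize_clone`, the index list and all numbers are hypothetical.

```python
def normalize_clone(peak_areas, control_idx):
    """Scale one clone's peak areas so its control region averages 1.0.

    `control_idx` lists the positions of the shared control sequence
    (assumption: a simple mean-based scaling, not the authors' exact scheme).
    """
    control_mean = sum(peak_areas[i] for i in control_idx) / len(control_idx)
    if control_mean == 0:
        raise ValueError("control region has zero signal")
    return [area / control_mean for area in peak_areas]


# Made-up peak areas for two clones whose gels differ tenfold in signal.
clone_a = normalize_clone([200.0, 400.0, 100.0, 300.0], control_idx=[0, 1])
clone_b = normalize_clone([20.0, 40.0, 30.0, 10.0], control_idx=[0, 1])
```

After scaling, values from different clones are expressed relative to the same control signal and can be compared directly.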

The binding sequences of 12 minor groove-binding
proteins were cloned in pUC19 as two fragments,
each containing six binding sites. The two
Drosophila Hox binding sequences were cloned as a
single fragment in pUC19 (Supplementary Table S1).
All DNA fragments used in this study were 32P-labelled
at the 3'-end of the EcoRI or HindIII sites of pUC19
derivatives by use of standard techniques.

The uranyl photo-cleavage was performed as previously 
described (12,13,16). A Molecular Dynamics Storm 860 
phosphorimager was used to collect data from phosphor 
storage screens. In order to quantify band intensities, we
used the SAFA (Semi-Automated Footprinting Analysis)
software package (17).

Figure 1. Uranyl cleavage pattern of a pentanucleotide library and model for minor groove potential prediction. (A) Global view of the relative nucleotide cleavage intensities (in arbitrary units) in a DNA fragment containing all 1024 combinations of pentanucleotide sequences. Both strands are represented. (B) Expansion of a region (rectangle) of the 1024 bp library from panel A (see Supplementary Figure S1 for details of all sequences in the library). (C) Principle of the analysis of uranyl cleavage and minor groove electronegative potential. A pentamer [example GATGC (x strand)] contains three n_x bases (T, G and C) connected by lines to the n_y - 2 positions (C, T and A) on the lower y strand. (D) Algorithm used for predicting the uranyl cleavage and electronegative potential in a given DNA sequence (see 'Materials and Methods' section for details). (E) In order to obtain a relative value for the minor groove potential defined by the bases n_x and n_y - 2, the cleavage values at the n_x and n_y - 2 positions were summed and the potential assigned to base n_x - 1 in the sequence.

The algorithm uses a sliding window of five bases
(Figure 1D). Each score vector of five scores (a-j) is found
by matching the DNA pentamer in the uranyl
cleavage table (shown as histogram in Supplementary
Figure S1A-D), which holds all 1024 possible pentamers.
The two first scores are set to zero according to the
n + (n - 2) algorithm (Figure 1C). The score vector is
placed in a result matrix which has five rows and as
many columns as there are bases in the sequence. The
sliding window is then moved one base and the next
score vector is placed on the second line in the result
matrix below its pentamer. When the window has been
shifted five times, the row number of the result matrix is
reset and the sixth score vector will be inserted at the first
line in the result matrix, etc. The final score vector
(y) is the sum of each column in the result matrix,
which holds three scores, divided by three. Thus, for
prediction of values for the electronegative potential, the sum
of the predicted uranyl cleavage scores for each n + (n - 2)
base pair is averaged and the value is assigned to the
n - 1 base (Figure 1E).
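
The sliding-window procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the supplementary implementation: the function name `predict_cleavage` and the toy lookup table are hypothetical, and a real table would hold the measured uranyl cleavage values for all 1024 pentamers. Instead of cycling through a five-row result matrix, the sketch accumulates the column sums directly, which yields the same totals.

```python
from itertools import product


def predict_cleavage(seq, table):
    """Predict relative cleavage per base with a five-base sliding window.

    `table` maps each pentamer to five per-position scores; here it is a
    hypothetical stand-in for the experimental uranyl cleavage table.
    """
    seq = seq.upper()
    totals = [0.0] * len(seq)
    for start in range(len(seq) - 4):
        scores = list(table[seq[start:start + 5]])
        scores[0] = scores[1] = 0.0  # the two first scores are set to zero
        for offset, score in enumerate(scores):
            totals[start + offset] += score
    # Each interior base collects three non-zero scores, which are averaged.
    # Bases near the ends are covered by fewer windows and are underestimated.
    return [total / 3.0 for total in totals]


# Toy table (assumption): a base scores 1.0 if it is A or T, otherwise 0.0.
toy_table = {
    "".join(p): tuple(1.0 if base in "AT" else 0.0 for base in p)
    for p in product("ACGT", repeat=5)
}

profile = predict_cleavage("GGCAATTGCCGG", toy_table)
```

With the toy table, interior A/T positions receive the full score, whereas the A at index 3 is covered by only two scoring windows and is therefore lower; real data would need flanking sequence to avoid such edge effects.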



RESULTS AND DISCUSSION 

Extensive studies on bent A-tract DNA have shown that 
sequence-dependent DNA conformation and thus also 
minor groove width and electrostatic potential require 
tetra/pentamer regions in order to be defined in terms of 
sequence dependence (18). Therefore, to arrive at an
experimentally founded model for sequence-based prediction
of DNA minor groove electronegative potential, we
decided to analyse the uranyl photo-cleavage of all
possible pentanucleotide helices as found in a previously
published DNA fragment library of all 1024 possible 
pentanucleotides (19). As can be seen from the results 
(Figure 1A), more than 10-fold variation in cleavage intensity
is observed along the fragment. In particular, A/T-rich
regions are, as expected, generally most efficiently
cleaved (Figure 1B and Supplementary Figure S1A-D)
(13). However, no direct correlation between the size
and simple sequence of the AT region and the cleavage
is apparent. For instance, sequences containing two or
three contiguous A/T base pairs such as GAACT
(Supplementary Figure S1A), GAAAC (Supplementary
Figure S1B), GAACC (Supplementary Figure S1B) and
GATAC (Supplementary Figure S1C and Figure 1B),
and even sequences without any A/T dinucleotide steps
such as AGACT (Supplementary Figure S1B) and GGACA
(Supplementary Figure S1D), exhibit increased cleavage,
while other seemingly analogous sequences such as
CAATC (Supplementary Figure S1A), GAAGA
(Supplementary Figure S1A), CAAAC (Supplementary
Figure S1B) and CATAG (Supplementary Figure S1B)
do not exhibit hyper-reactivity towards uranyl cleavage.
Also in accordance with all previous results, the
positions of cleavage maxima and minima on the two DNA
strands in general are consistent with binding/cleavage
across the minor groove, exhibiting a 2-nt stagger
towards the 3'-end between the two strands (the shortest
distance across the minor groove) (12-15) (Figure 1B and
Supplementary Figure S1A-D). Thus, within each
pentamer three such minor groove n_x-(n_y - 2) 'base
pairs' are defined (Figure 1C). Consequently, we designed
an algorithm that divides a sequence into overlapping
pentamers (Figure 1D). For each pentamer, the cleavage
values for the first three 3'-bases on each strand are obtained
from the experimental pentamer library cleavage data
(Supplementary Figure S1). Thus, eventually each base
position is assigned the sum of three cleavage values
(Figure 1D, example in Supplementary Figure S2). Often
a quantitative asymmetry of the uranyl cleavage between
the two strands is observed (13). Therefore, the prediction
of the relative minor groove electronegative potential is
obtained as the sum of the (predicted) cleavage at
position n_x and position n_y - 2, and is assigned to base
position n - 1 at the centre of an imaginary line between
the n_x'th and the (n_y - 2)'th phosphate (Figure 1E).
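
The strand-combination step can be sketched as follows. This is a minimal illustration that assumes both cleavage profiles are expressed in the coordinates of the upper (x) strand, so that `cleave_y[i]` is the cleavage of the y-strand nucleotide paired with x-strand base i. The function name and the input numbers are hypothetical.

```python
def minor_groove_potential(cleave_x, cleave_y):
    """Sum cleavage at n_x and n_y - 2 and assign it to position n - 1.

    Both lists are indexed by x-strand position (assumption). Positions
    without a cross-groove partner are returned as None.
    """
    if len(cleave_x) != len(cleave_y):
        raise ValueError("profiles must have equal length")
    potential = [None] * len(cleave_x)
    for n in range(2, len(cleave_x)):
        potential[n - 1] = cleave_x[n] + cleave_y[n - 2]
    return potential


# Made-up cleavage values for one pentamer on each strand.
x_strand = [1.0, 2.0, 9.0, 4.0, 3.0]
y_strand = [5.0, 8.0, 6.0, 2.0, 1.0]
potential = minor_groove_potential(x_strand, y_strand)
```

For a pentamer this yields three values, one for each cross-groove pair, placed at the three central positions; the two terminal positions have no partner and stay undefined.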

In order to validate the model both in terms of uranyl 
cleavage and minor groove electronegative potential pre- 
diction, we chose 14 well-described protein binding sites 
and a thoroughly analysed nucleosome sequence (9,10). 
All of the analysed protein binding sites involve arginine 
interactions in regions with a narrow minor groove and 
enhanced electronegative potential according to calcula- 
tions on the basis of 3D crystal structures found in the 
Protein Data Bank (PDB) (9,10). Comparison of the
experimentally obtained uranyl photo-cleavage pattern of
these DNA regions with that predicted from our sliding
window model algorithm clearly validates the model in
terms of cleavage prediction, although minor differences
are seen in regions of less intense cleavage (Figure 2A and 
Supplementary Figure S3). Likewise the relative minor 
groove electronegative potential map calculated from the 
data as described above shows very good correspondence 
between the experimentally based and the algorithm pre- 
dicted values (Figure 2B). Furthermore, the relative minor 
groove electronegative potential map obtained from 
uranyl cleavage data corresponds excellently with that pre- 
viously calculated from crystal structure data (9,10), and 
thus with the experimentally determined positions of 
arginines in the protein-DNA complexes (Figure 2B). 
Consequently, we conclude that the algorithm provides a 
very valuable and reliable tool for relative, semi- 
quantitative prediction of DNA helix minor groove elec- 
tronegative potential solely based on DNA sequence. 
Furthermore, because the uranyl probing reflects the
properties of free DNA in solution, we conclude that the
minor groove features responsible for protein recognition
through electrostatic binding via arginines (at least for
the protein-DNA complexes analysed here) are features
of the native DNA helix conformation and are not induced
by protein binding. This conclusion cannot be drawn from
crystal data on protein-DNA complexes alone. Thus, these
proteins may indeed find their cognate DNA target by
electrostatic search of the helix minor groove.

The strength of the approach is clearly illustrated by the 
analysis of the two Drosophila Hox binding sequences, the 
fsh250 and the variant fsh250con sites, differing only by a
single TA/AT base pair change. The two binding sequences,
containing the ATTAAT (fsh250) and ATTTAT
(fsh250con) hexamer sequences, respectively, have been
shown to mediate functional specificity through small dif- 
ferences in minor groove width and electrostatic potential 
as calculated from X-ray structures (10). Our prediction 
clearly reflects this subtle difference in minor groove
potential that allows two arginines and one histidine to bind
in the broader region of a higher electronegative potential
in the ATTAAT sequence, whereas only a single arginine
binds in the ATTTAT sequence due to an elevated
electronegative potential only in the left part of the hexamer
sequence (Figure 2B). Analogously, excellent correspondence
between the uranyl data and crystallography results
of minor groove potential and arginine positions was
found for the motility gene repressor (MogR) (20),
which binds to a long A/T region in which a TA step
with a local widening of the minor groove produces a
bifurcated electronegative potential where two arginines
can bind. Indeed our analysis is fully consistent with the 
data for all 14 protein binding sites including the 
Drosophila Hox protein, UBX and the cofactor EXD,
which binds by inserting an arginine into a very electronegative
pocket of the minor groove (21), for the DNA
site recognized by the MATa1-MCM1 complex (22) (in
which Arg 7 binds within a long electronegative region
whereas Arg 4 binds in a region of a less electronegative
potential), and for the mammalian OCT1-PORE complex
in which two POU domains (23) bind to two A-tract
regions possessing electronegative pockets for four arginines.
These examples together with the results of the
binding sites for Msx-1 (24), OCT1-Pou (25), Pit1 (26),
PhoB (27), MATa2-MCM1 (28), HAP1-18 (29), Tc3
transposase (30) and the 434 repressor (31) (Figure 2B
and Supplementary Figure S3) all clearly demonstrate 
that the predicted minor groove electronegative potential 
of free DNA provides a powerful prediction for potential 
positions of arginines in protein-DNA complexes. 

Analysis of the symmetrical DNA used for X-ray struc- 
ture determination of a nucleosome core complex (32) 
gives a slightly different picture. Although as found for 
the protein recognition sites, the correlation between ex- 
perimental and predicted uranyl cleavage (Figure 3A) (and 
thus the derived minor groove electronegative potential 
measure) (Figure 3B) is qualitatively very good, only a 
subset of the regions identified as having highly elec- 
tronegative (and also narrow) minor groove in free 
DNA appear so in the nucleosome core particle (32) 
(Figure 3B). Specifically, we note that there is very good 
correlation at nucleosome positions 68, 55, 45 and 38 at 
the outer part of the core DNA (one half of the symmet- 
rical DNA is shown) while very little correlation between 
the free DNA structure and the nucleosome structure is 
found in the central region. In particular, the highly elec- 
tronegative minor grooves at positions 32, 10 and 0 are 
virtually out of phase with the DNA wrapping around the 
nucleosome core. Not unexpectedly, this would imply that 
the conformation of this DNA is not fully predetermined 
for nucleosome wrapping, but that certain, presumably 
key regions (in this case positions 68, 55, 45 and 38), are 
nucleating the process and thus direct the final folding 
which by induced fit 'forces' the remaining DNA into 
the 10 bp regular phasing around the histone core. 
Clearly it would be biologically advantageous for evolu- 
tion to select nucleosomes of intermediate thermodynamic 
stability or nucleosomes with alternative positioning
preferences as this would allow energetically less costly
remodelling for gene activation, and sliding and unravelling
during replication and transcription. Thus, upon
genome-wide analyses of minor groove electronegative
potential employing dedicated algorithms it may become
possible to predict preferred nucleosomal positioning as
well as possible remodelled states in the genome.
Furthermore, other patterns may signify functional 
regions such as promoters, enhancers, etc. 

This latter point is very clearly illustrated by the analysis 
of the divergent yeast GAL1 and GAL10 promoters, 
which share a common upstream activating region
containing four binding sites for the GAL4 protein. This
region has been thoroughly characterized for nucleosome 
positioning in vivo (33,34). The minor groove electronega- 
tive potential map (Figure 3C) shows a very distinct dif- 
ference between the GAL4 binding region and the 
surrounding DNA, which contains specifically positioned
nucleosomes. While the nucleosome regions show major
variations in electronegative potential, many within a 10
bp period compatible with nucleosome predisposition, the
GAL4 region shows only minimal variation, which would
indicate much lower propensity for forming nucleosomes.
Furthermore, analysis of a series of human promoters
indicates a similar general pattern in which the regions
around the transcription start site exhibit significantly lower
electronegative potential variation than the regions
surrounding the promoters (Supplementary Figure S4).
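
One simple way to quantify such differences in potential variation along a sequence is a sliding-window standard deviation over the potential map. The sketch below is an illustration only: the text does not specify how variation was measured, and the function name, the 10-position window (chosen here to match the 10 bp period mentioned above) and the input values are assumptions.

```python
from statistics import pstdev


def windowed_variation(potential, window=10):
    """Population standard deviation of the potential in each window."""
    if window < 1 or window > len(potential):
        raise ValueError("window must be between 1 and len(potential)")
    return [
        pstdev(potential[i:i + window])
        for i in range(len(potential) - window + 1)
    ]


# Made-up map: a flat stretch followed by a strongly alternating stretch.
flat_region = [1.0] * 20
variable_region = [0.0, 2.0] * 10
variation = windowed_variation(flat_region + variable_region, window=10)
```

In this toy map the flat stretch gives zero variation and the alternating stretch gives high variation, mimicking the contrast described between the GAL4 binding region and the flanking nucleosome regions.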

In view of recent uncertainty about the importance of
sequence-directing effects on nucleosome positioning
(35,36), these results clearly would indicate a very
pronounced influence of DNA helix properties (which 
are sequence instructed) on in vivo nucleosome positioning 
[at least for the GAL1-10 locus and also for (certain) 
human promoters]. Obviously, nucleosome positioning is 
more complicated than merely a question of electronega- 
tive potential. Nucleosome positioning and octamer 
binding to G/C-rich motifs (37,38) will not be predicted 
by the electronegative potential analysis, possibly because 
this binding is predominantly driven by DNA flexibility 
properties. Eventually, a combination of two or more
algorithms taking into account both groove potential and
DNA flexibility may be required to fully predict
energetically preferred nucleosome positioning. Nonetheless,
the present algorithm for the first time allows this type of
analysis of the influence and importance of DNA structure
in terms of minor groove electronegative potential on
genome function and organization, and the results so far 
on promoters and nucleosome DNA clearly indicate that 
this parameter is important. Further comprehensive 
genome analyses will reveal the implications of this 
approach. 

Obviously, regions of high minor groove electronegative 
potential will in general also be AT-rich because the 
highest negative potential is found in connection with 
AT-tracts (12); but AT-rich regions [e.g. short AT 
(N>3) runs interrupted by single G/C base pairs] do 
not necessarily show high minor groove electronegative 
potential. Therefore, because of the molecular origin of
this DNA property, the electronegative potential prediction
is inherently sequence biased, but a
simple sequence analysis without the algorithm will of
course not reveal the electronegative potential information
at all. It is also important to emphasize that it is not the
electronegative potential per se but the pattern along the 
DNA helix which is the target of such analyses. 

An algorithm based on the hydroxyl radical cleavage 
method has recently been developed for measuring local 
variations in DNA structure at single nucleotide reso- 
lution (19). Modulation in hydroxyl radical cleavage 
reflects the average helix structure in terms of solvent ac- 
cessibility of the deoxyribose in the minor groove. Thus 
the charge neutral hydroxyl radical may sense differences 
in groove width (and helix conformation) but hardly elec- 
tronegative potential per se. In contrast, the DNA inter- 
action of the cationic uranyl ion will be directly influenced 
by the local electrostatic potential of the DNA helix. 
Therefore, as argued previously (13,39), information
obtained by the two methods is not equivalent but 
rather complementary, and may be combined in order to 
obtain a full description of the DNA helix conformation 
and properties. Furthermore, variations in uranyl cleavage
are much more pronounced than those in hydroxyl radical cleavage
and therefore more sensitively reflect subtle differences in
helix structure and property, in casu minor groove
electronegative potential. Thus, a comparison of the two methods
at the genomic level would be interesting in order to
deduce the correlation between minor groove width and
electronegative potential in genomes, and also to help
unravel the structural and molecular details of the
connection between these two parameters of the DNA double
helix. Finally, because the algorithm is based on pentamer
data we recognize that (special) features of the DNA helix 
that are dependent on longer stretches of the helix may not 
be predicted by the algorithm, and that it therefore can be 
refined by incorporation of more experimental data or by 
expanding it to a hexamer or heptamer format.



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. The
supplementary algorithm is available at:
http://gastro.sund.ku.dk/nar/